perm filename LIB.PRO[1,JMC] blob
sn#005297 filedate 1971-02-17 generic text, type T, neo UTF8
00100 PROPOSAL FOR
00200
00300 A COMPUTER SCIENCE LIBRARY ON THE LASER FILE
00400
00500 by John McCarthy, Stanford University
00600
00700 The ARPA 10↑12 bit file provides the first opportunity to
00800 store a library in machine usable form at reasonable cost. Namely,
00900 the capital cost of storage on the file is 10↑-6 dollars per bit; a
01000 100,000 word book is about 4x10↑6 bits, so the capital cost of
01100 storing it is $4.00. This is considerably less than the cost of the
01200 book and the storage space for it in a conventional library.
01300
01400 We envisage a library containing all important computer
01500 science and technology books, journals and reports; this would total
01600 between 2000 and 10,000 of the above-mentioned 100,000 word volumes,
01700 so we estimate the storage cost at between $8000 and $40,000. This
01800 adds nothing to the immediate expense of the project, since the
01900 file is already committed.
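The storage arithmetic above can be checked with a short sketch. The constants are the figures quoted in the text; the function names are illustrative, and the share-of-file calculation is an added inference from the 10↑12 bit capacity, not a figure given in the proposal.

```python
# Figures quoted in the proposal.
BITS_PER_DOLLAR = 10**6        # capital cost of 10^-6 dollars per bit
BITS_PER_VOLUME = 4 * 10**6    # one 100,000 word volume
FILE_CAPACITY_BITS = 10**12    # the ARPA file


def storage_cost_dollars(volumes):
    """Capital cost of storing the given number of volumes."""
    return volumes * BITS_PER_VOLUME / BITS_PER_DOLLAR


def fraction_of_file(volumes):
    """Share of the file the volumes would occupy (inferred, not in the text)."""
    return volumes * BITS_PER_VOLUME / FILE_CAPACITY_BITS


if __name__ == "__main__":
    for volumes in (1, 2000, 10000):
        print(volumes, storage_cost_dollars(volumes), fraction_of_file(volumes))
```

On these figures even the larger estimate of 10,000 volumes would occupy only a few percent of the file.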
02000
02100 We envisage the file being read through display consoles.
02200 According to a survey at the ARPA IPT contractors' meeting, about 100
02300 suitable consoles are already in use or soon will be. This may be
02400 optimistic, as not all of the consoles may be suitable for reading.
02500 Someone reading at 600 words per minute will use approximately
02600 400 bits/second of the network's communication capacity, which is
02700 1/125th of one channel. Naturally, this will be used in bursts of
02800 page length. (To transmit a 1000 word page will take 0.8 seconds.)
02900 Thus, the network's capacity will not be strained even in the
03000 unlikely event that the library turns on all the members of all the
03100 projects to reading the literature. Browsing and the use of
03200 information retrieval programs would increase the data rates
03300 required, but clearly experimental use cannot strain the network.
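The data-rate estimate can be reproduced as follows. This is a sketch under two assumptions that the text implies but does not state: about 40 bits per word (400 bits/second at 10 words/second) and a channel capacity of 50,000 bits/second (since 400 bits/second is given as 1/125th of one channel). The function names are illustrative.

```python
BITS_PER_WORD = 40               # assumption implied by 400 bits/s at 600 wpm
CHANNEL_BITS_PER_SEC = 50_000    # assumption implied by the 1/125 figure


def reading_rate_bits_per_sec(words_per_minute):
    """Average data rate consumed by one reader."""
    return words_per_minute * BITS_PER_WORD / 60


def channel_share(words_per_minute):
    """Fraction of one channel used by one reader."""
    return reading_rate_bits_per_sec(words_per_minute) / CHANNEL_BITS_PER_SEC


def page_transmit_seconds(words_per_page):
    """Time to send one page as a single burst over one channel."""
    return words_per_page * BITS_PER_WORD / CHANNEL_BITS_PER_SEC


if __name__ == "__main__":
    print(reading_rate_bits_per_sec(600))   # bits/second for one reader
    print(channel_share(600))               # fraction of one channel
    print(page_transmit_seconds(1000))      # seconds per 1000 word page
    # Load if all of the roughly 100 surveyed consoles read at once.
    print(100 * reading_rate_bits_per_sec(600), "bits/second in total")
```

With these assumptions, the roughly 100 consoles reading simultaneously would together average less than the capacity of a single channel.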
03400
03500 A substantial library of reports can be created without
03600 worrying about copyright restrictions, but I think we can and should
03700 try to get publishers' permission to include their material. The
03800 limited number of users will clearly not cost them many sales, and
03900 getting the material into machine usable form will, in the long run,
04000 more than counterbalance this. There is, however, much to be said
04100 for negotiating a royalty agreement based on the amount of usage of
04200 the material as measured by the system. This can serve as a
04300 prototype for such agreements in the future.
04400
04500 The facility should be regarded as a library and not, in
04600 itself, as an information retrieval system. Naturally, computerised
04700 information retrieval systems can use the library, and documents
04800 created by such work can be included in the library, but the library
04900 itself should not be committed to any particular approach to
05000 information retrieval. At the extreme, each document has a number,
05100 and every other kind of lookup must be accomplished with the help
05200 of programs that use auxiliary documents such as catalogs, indexes
05300 and bibliographies. In practice, at least one librarian
05400 organization would have to be supported by ARPA to assure minimum
05500 facilities.
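The separation described above, where the library only maps numbers to documents and all other lookup is done by outside programs reading auxiliary documents, might be sketched as follows. Everything here is hypothetical: the Library class, the tab-separated catalog format, and the find_by_title helper are illustrations, not part of the proposal.

```python
class Library:
    """Stores documents by number and knows nothing about their contents."""

    def __init__(self):
        self._documents = {}
        self._next_number = 1

    def add(self, text):
        """Store a document and return the number assigned to it."""
        number = self._next_number
        self._documents[number] = text
        self._next_number += 1
        return number

    def fetch(self, number):
        """Return the text of the document with the given number."""
        return self._documents[number]


def find_by_title(library, catalog_number, title):
    """A retrieval program outside the library.

    It reads a catalog, itself an ordinary document, whose lines are
    assumed to have the form "<number><TAB><title>".
    """
    for line in library.fetch(catalog_number).splitlines():
        number, _, entry_title = line.partition("\t")
        if entry_title == title:
            return int(number)
    return None


if __name__ == "__main__":
    lib = Library()
    report = lib.add("Text of a report ...")
    manual = lib.add("Text of a manual ...")
    catalog = lib.add(f"{report}\tA Report\n{manual}\tA Manual")
    print(lib.fetch(find_by_title(lib, catalog, "A Manual")))
```

Because the catalog is just another numbered document, a different retrieval approach can add its own indexes without any change to the library itself.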
05600
05700 Defense Department use of such systems goes, of course, far
05800 beyond the computer science area, but computer science and technology
05900 is the right place to start because the consoles exist, the network
06000 exists, and the propensity to use such a system exists. We envisage
06100 DoD systems coming along about two years after the computer science
06200 system is demonstrated. It might be worthwhile, however, to begin
06300 work early on the cryptography certification required to allow the
06400 file to be used for classified material.
06500
06600 The major problem with such a library is getting a large body
06700 of material in computer usable form. According to Dan Forsyth of
06800 Information International, present key-punching costs run $.75 to
06900 $1.00 per thousand characters. At this rate, the cost of entering
07000 the proposed computer science and technology library would be between
07100 $900,000 and $6,000,000. He estimates that his company's optical
07200 character recognition system might allow a contract costing between
07300 $.10 and $.25 per 1000 characters. This comes to between $120,000
07400 and $1,500,000. Presumably, these uncertainties in the size of the
07500 collection required and the costs of converting it could be reduced
07600 rather quickly. It is scarcely necessary to point out that there are
07700 many commercial approaches to optical character recognition, but the
07800 capacity of these approaches to handle the required wide variety of
07900 fonts has to be looked into. Some combination of R&D contracts and
08000 fixed price contracts is the most likely way of getting the work
08100 done.
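The conversion cost ranges quoted above can be reproduced with the sketch below. It assumes about 600,000 characters per 100,000 word volume; that figure is not stated in the text but is the one that makes the quoted totals come out. Rates are held in whole cents to keep the arithmetic exact, and the names are illustrative.

```python
CHARS_PER_VOLUME = 600_000   # assumption: about 6 characters per word
VOLUMES_LOW, VOLUMES_HIGH = 2000, 10000

# Rates in cents per thousand characters, as quoted in the text.
KEYPUNCH_CENTS = (75, 100)
OCR_CENTS = (10, 25)


def conversion_cost_dollars(volumes, cents_per_thousand_chars):
    """Cost in whole dollars of entering the given number of volumes."""
    thousands_of_chars = volumes * CHARS_PER_VOLUME // 1000
    return thousands_of_chars * cents_per_thousand_chars // 100


def cost_range(cents_range):
    """Cheapest case (small library, low rate) to dearest case."""
    low_rate, high_rate = cents_range
    return (conversion_cost_dollars(VOLUMES_LOW, low_rate),
            conversion_cost_dollars(VOLUMES_HIGH, high_rate))


if __name__ == "__main__":
    print("key-punching:", cost_range(KEYPUNCH_CENTS))
    print("OCR:", cost_range(OCR_CENTS))
```

The width of each range comes from multiplying two uncertainties, the fivefold spread in library size and the spread in per-character rate, which is why narrowing either one reduces the estimate quickly.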
08200
08300 Much printed material is already prepared in machine readable
08400 form for the use of the printers. It will take quite an effort,
08500 however, to get all that material read into a computer.
08600
08700 Considerable standardization effort will be required to
08800 devise a good system for representing text in various fonts, and
08900 illustrations, in computer memory, and to make a system for
09000 displaying them, printing hard copy and making microfiches.
09100
09200 The file system is scheduled to be operational at Ames
09300 Research Center in the spring of 1972. We believe that the costs
09400 could be estimated and contracts let during the summer of 1971, and
09500 that the first documents could go into the system, with programs for
09600 reading them available in TENEX, about September 1972.
09700
09800 The Stanford Artificial Intelligence Project already has
09900 facilities for keeping documents in the computer and displaying them,
10000 and these features are in use. Engelbart's group at SRI has similar
10100 experience and a charter that is perhaps somewhat closer to
10200 maintaining a library. It is likely that Stanford AI and SRI could
10300 collaborate in the early stages of estimating costs and determining a
10400 good way of carrying out the project.